Abstract 

Background: The lung is critical in surveillance and initial defense against pathogens. In humans, as in mice, 
individual genetic differences strongly modulate pulmonary responses to infectious agents, severity of lung disease, 
and potential allergic reactions. In a first step towards understanding genetic predisposition and pulmonary 
molecular networks that underlie individual differences in disease vulnerability, we performed a global analysis of 
normative lung gene expression levels in inbred mouse strains and a large family of BXD strains that are widely 
used for systems genetics. Our goal is to provide a key community resource on the genetics of the normative lung 
transcriptome that can serve as a foundation for experimental analysis and allow predicting genetic predisposition 
and response to pathogens, allergens, and xenobiotics. 

Methods: Steady-state polyA+ mRNA levels were assayed across a diverse and fully genotyped panel of 57 
isogenic strains using the Affymetrix M430 2.0 array. Correlations of expression levels between genes were 
determined. Global expression QTL (eQTL) analysis and network covariance analysis were performed using tools and 
resources in GeneNetwork (http://www.genenetwork.org). 

Results: Expression values were highly variable across strains and in many cases exhibited high heritability. 
Several genes whose expression was restricted to lung tissue were identified. Using correlations 
between gene expression values across all strains, we defined and extended memberships of several important 
molecular networks in the lung. Furthermore, we were able to extract signatures of immune cell subpopulations 
and characterize co-variation and shared genetic modulation. Known QTL regions for respiratory infection 
susceptibility were investigated and several cis-eQTL genes were identified. Numerous cis- and trans-regulated 
transcripts and chromosomal intervals with strong regulatory activity were mapped. The Cyp1a1 P450 transcript 
had a strong trans-acting eQTL (LOD 11.8) on Chr 12 at 36 ± 1 Mb. This interval contains the transcription factor 
Ahr that has a critical missense allele in the DBA/2J haplotype and evidently modulates transcriptional activation 
by AhR. 

Conclusions: Large-scale gene expression analyses in genetic reference populations revealed lung-specific and 
immune-cell gene expression profiles and suggested specific gene regulatory interactions. 
Background 

The lung is the first line of defense against many patho- 
gens and inhaled xenobiotics and is therefore a key part 
of the immune system. Host defense is strongly influ- 
enced by genetic differences and several studies have 
shown that the genetic background and sequence differ- 
ence among humans and other host species modulate 
susceptibility and resistance to infectious diseases, aller- 
gens, and xenobiotics. Systems genetics is a modern 
extension of complex trait analysis that jointly analyzes 
and integrates large sets of genotypes and phenotypes to 
explain and predict variation in outcome measures and 
disease severity (for review see [1,2]). A typical systems 
genetics study relies on extensive single nucleotide poly- 
morphism (SNP) data sets, matched data on RNA 
expression in key cells, tissues, or organs and a core set 
of key dependent measures such as disease susceptibility 
[3]. These data are collected across a panel or popula- 
tion of genetically diverse individuals or strains. This 
group of individuals represents a natural genetic pertur- 
bation, with well defined genotype and haplotype differ- 
ences comprising the "treatment." The independent 
measurements in this case can consist either of the gen- 
otype or of crucial intervening variables such as the 
expression of genes and proteins. 

In this study, we exploited a very well characterized 
panel of inbred strains of mice (a mouse genetic reference 
panel) that consists of two parts: a small set of 
standard inbred strains and a larger family of BXD type 
recombinant inbred strains. The genome of each BXD 
strain represents a mixture of the C57BL/6J and DBA/2J 
parental background and is homozygous at almost every 
genomic location. The genomic make-up of each BXD 
line has been determined by extensive mapping with 
molecular markers. After performing microarray expres- 
sion analysis for each of the BXD mice, the expression 
level of each gene can be used as a quantitative trait 
(e.g. [4-6]). By comparing these expression values for all 
BXD mice with their molecular markers data along the 
genome, genomic expression quantitative trait loci 
(eQTL) can be identified that are likely to regulate the 
expression of one or several genes [2,5,7-12]. When an 
eQTL is located at the same genomic position as the 
gene itself (within a 10 Mb interval of the gene) it is 
considered a cis-eQTL. In this case, variations in the 
promoter sequence or in elements that determine the 
stability of the mRNA of the gene are the most likely 
causes for the observed differences in expression levels. 
If the eQTL is at a distant location from the regulated 
gene, the eQTL is referred to as a trans-eQTL. 
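
The cis/trans distinction described above can be sketched in a few lines of Python. This is only an illustration: the `Eqtl` record and the `classify_eqtl` function are hypothetical names, and reading "within a 10 Mb interval" as a distance of at most 10 Mb between the gene and the eQTL peak is an assumption, not a rule stated in the text.

```python
from dataclasses import dataclass

# Window quoted in the text; treating it as a maximum gene-to-peak distance is an assumption.
CIS_WINDOW_MB = 10.0


@dataclass
class Eqtl:
    probe_set: str
    gene_chr: str    # chromosome of the gene itself
    gene_mb: float   # position of the gene in Mb
    peak_chr: str    # chromosome of the eQTL peak
    peak_mb: float   # position of the eQTL peak in Mb


def classify_eqtl(e: Eqtl, window_mb: float = CIS_WINDOW_MB) -> str:
    """Return 'cis' if the peak is on the gene's chromosome within window_mb, else 'trans'."""
    same_chr = e.gene_chr == e.peak_chr
    if same_chr and abs(e.gene_mb - e.peak_mb) <= window_mb:
        return "cis"
    return "trans"


# Positions taken from this paper: Rpgrip1 maps to its own location on Chr 14,
# whereas the peak listed for Cd3d (gene on Chr 9) lies on Chr 7.
rpgrip1 = Eqtl("1421144_at", "14", 52.5, "14", 52.5)
cd3d = Eqtl("1422828_at", "9", 44.789979, "7", 81.491656)
print(classify_eqtl(rpgrip1), classify_eqtl(cd3d))
```

The example only demonstrates the positional rule; whether a peak is statistically significant has to be decided separately.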

Here, we performed a global gene expression analysis 
from the lungs of 47 BXD and eight widely used inbred 
strains. The aim of our study was to reveal genes and 
gene networks in mouse lung in steady state condition. 



We found that many genes had high variation in expres- 
sion and that often this variation was highly heritable. 
This allowed us to identify many cis- and trans-eQTLs. 
In addition, we used the correlation structure in the 
data to obtain expression signatures for specific cell 
types within the lung. 

Methods 

Mouse strains and sample preparation 

C57BL/6J, BALB/cByJ, FVB/NJ, and WSB/EiJ, as well as 
B6D2F1 and D2B6F1 lines were obtained from the Uni- 
versity of Tennessee Health Science Center (UTHSC). 
DBA/2J, 129X1/SvJ, LP/J and SJL/J were obtained from 
The Jackson Laboratory. Mice from 38 BXD recombi- 
nant inbred strains were obtained from UTHSC and 
mice from nine BXD strains were obtained from The 
Jackson Laboratory. All animals were housed at UTHSC 
before sacrifice. Mice were killed by cervical dislocation 
and whole lungs including blood were removed and 
placed in RNAlater. Total RNA was extracted from the 
lungs using RNA STAT-60 (Tel-Test Inc.). RNA from 
two to five animals per strain was pooled and used for 
gene expression analysis. Animals used in this study 
were between 49 and 93 days of age. All inbred strains 
were profiled for both sexes, and for a given BXD strain 
either males or females were used. Mice were main- 
tained under specific pathogen free conditions. All pro- 
tocols involving mice were approved by the UTHSC 
Animal Care and Use Committee. 

Microarray analysis 

Gene expression profiling was performed using Affyme- 
trix GeneChip Mouse Genome 430 2.0 Arrays at 
UTHSC. Samples were amplified according to the 
manufacturer's recommended protocols (Affymetrix, 
Santa Clara, CA, USA). In all cases, 4-5 µg of each 
biotinylated cRNA preparation was fragmented and 
included in a hybridization cocktail containing four bio- 
tinylated hybridization controls (BioB, BioC, BioD, and 
Cre), as recommended by the manufacturer. Samples 
were hybridized for 16 hours. After hybridization, Gene- 
Chips were washed, stained with SAPE, and read using 
an Affymetrix GeneChip fluidic station and scanner. 

Data preprocessing and analysis 

Data analysis was performed using the GeneNetwork 
web service [13], a large resource with phenotypes and 
mRNA expression data for several genetic reference 
populations and multiple organisms. The expression 
data were preprocessed like all other datasets in 
GeneNetwork: an offset of 1 unit was added to each signal 
intensity value to ensure that the logarithm of every value 
was positive, the log2 value was computed, the log2 values 
for the total set of arrays were quantile normalized using 
the same initial steps as the RMA transform [14], Z scores 
were computed for each cell value, and all Z scores were 
multiplied by 2 and increased by 8. The advantage of this 
variant of a Z transformation is that all values are positive 
and that 1 unit represents approximately a 2-fold difference in 
expression as determined using the spike-in control 
probe sets (see [8] for details). For correlation analyses 
we used Pearson's correlation unless otherwise stated. 
Heritability was determined using a one-way ANOVA with 
mouse strain as the factor, by dividing the mean 
between-strain variance by the sum of the mean 
between-strain variance plus the mean within-strain 
variance. 
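
The preprocessing steps listed above (offset, log2, quantile normalization, Z score, then 2Z + 8) can be sketched as follows. This is a minimal standard-library illustration and not the GeneNetwork implementation: the function names are hypothetical, ties are broken by position, and computing the Z score within each array is an assumption.

```python
import math
import statistics


def quantile_normalize(arrays):
    """Give every array the same distribution: the mean of the sorted values at each rank."""
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [statistics.fmean(s[i] for s in sorted_arrays) for i in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # indices from lowest to highest value
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


def two_z_plus_eight(values):
    """Z score within one array (assumption), rescaled to mean 8 and standard deviation 2."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [2.0 * (v - mean) / sd + 8.0 for v in values]


def preprocess(raw_arrays):
    """Offset by 1, take log2, quantile normalize across arrays, then apply 2Z + 8."""
    logged = [[math.log2(v + 1.0) for v in a] for a in raw_arrays]
    return [two_z_plus_eight(a) for a in quantile_normalize(logged)]


# Two toy arrays with four probe sets each (made-up intensities).
toy = preprocess([[10, 100, 1000, 10000], [8000, 50, 3000, 20]])
print([round(v, 2) for v in toy[0]])
```

After this transform every array has a mean of 8 and a standard deviation of 2, which is why values stay positive in practice.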

QTL Mapping and expression analyses 

All probe sets were mapped using standard interval 
mapping methods at 1 cM intervals (~2 Mb) along all 
autosomes and the X chromosome. This procedure gen- 
erates estimates of linkage between variation in tran- 
script expression levels and chromosomal location. The 
entire set of values can be used to construct a set of 
QTL maps for all chromosomes (except Chr Y and the 
mitochondrial genome) in which position is plotted on 
the x-axis and the strength of linkage, expressed as the likelihood 
ratio statistic (LRS) or the log of the odds ratio (LOD), is 
plotted on the y-axis. An LRS of 18 or higher is signifi- 
cant at a genome-wide p value of < 0.5. To compute 
LRS values we exploited the computationally efficient 
Haley-Knott regression equations [15] and a set of 3796 
SNPs and microsatellite markers that we and others 
have genotyped over the past decade [16,17]. In order to 
rapidly map all 45,101 probe sets we used our 
customized QTL Reaper code (http://qtlreaper.sourceforge.net/). 
QTL Reaper performs up to a million permutations 
of an expression trait to calculate the genome-wide 
empirical p value and the LRS scores associated 
with each interval or marker. The peak linkage value 
and position were stored in GeneNetwork, and users 
can rapidly retrieve and view these mapping results for 
any probe set. Any of the QTL maps can also be rapidly 
regenerated using the same Haley-Knott methods, again 
using functions embedded in GeneNetwork. 
GeneNetwork also enables a search for epistatic interactions (pair 
scanning function) and composite interval mapping with 
control for a single marker. 
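
To make the LRS, the LOD score and the permutation-based genome-wide p value more concrete, the following sketch uses plain single-marker regression, which is a simplification of the interval mapping described above and is not the QTL Reaper code. All function names are hypothetical, genotypes are assumed to be coded -1/+1 for the two parental alleles, and the toy data are made up.

```python
import math
import random
import statistics


def lrs_single_marker(trait, genotype):
    """Likelihood ratio statistic for regressing a trait on one marker (coded -1/+1).

    Assumes the marker is polymorphic and the fit is not perfect (non-zero residual).
    """
    n = len(trait)
    t_mean = statistics.fmean(trait)
    g_mean = statistics.fmean(genotype)
    sxy = sum((g - g_mean) * (t - t_mean) for g, t in zip(genotype, trait))
    sxx = sum((g - g_mean) ** 2 for g in genotype)
    rss_null = sum((t - t_mean) ** 2 for t in trait)
    rss_full = rss_null - sxy * sxy / sxx
    return n * math.log(rss_null / rss_full)


def lrs_to_lod(lrs):
    """LOD = LRS / (2 ln 10), i.e. roughly LRS / 4.61."""
    return lrs / (2.0 * math.log(10.0))


def permutation_p(trait, markers, n_perm=1000, seed=0):
    """Genome-wide empirical p: share of shuffles whose best LRS reaches the observed best."""
    rng = random.Random(seed)
    observed = max(lrs_single_marker(trait, m) for m in markers)
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max(lrs_single_marker(shuffled, m) for m in markers) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy example: eight strains, two markers, expression driven by the first marker.
trait = [8.1, 8.3, 7.9, 8.2, 10.1, 9.8, 10.3, 9.9]
markers = [[-1, -1, -1, -1, 1, 1, 1, 1], [-1, 1, -1, 1, -1, 1, -1, 1]]
best = max(lrs_single_marker(trait, m) for m in markers)
print(round(best, 1), round(lrs_to_lod(best), 1), permutation_p(trait, markers, n_perm=200))
```

With this conversion, the LRS threshold of 18 quoted above corresponds to a LOD of about 3.9, and a LOD of 11.8 corresponds to an LRS of about 54.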

Data quality control 

We used two simple but effective methods to confirm 
correct sample identification of all data entered into 
GeneNetwork. Expression of the Xist transcript (probe 
set 1427262_at) was used to validate the sex of the 
sample. Xist is involved in the inactivation of one X 
chromosome in females [18] and is only expressed at high 
levels in females. Other genes that show strong 
sex-specific expression are Eif2s3y, Jarid1d and Ddx3y. In 
addition, we investigated several genes that exhibit a 
strongly bimodal Mendelian expression pattern, meaning 
that one parental allele exhibits a high expression level 
whereas the other allele exhibits a low expression level and 
only the F1 hybrids are intermediate. The expression 
level of such transcripts is directly correlated with the 
genotype at this locus and they can collectively be used 
to confirm sample genotype and hence strain. For 
example, expression of the Rpgrip1 transcript (probe set 
1421144_at) has a distinctly bimodal distribution, 
intermediate values for F1 animals, and is associated with a 
LOD score peak of 50 that corresponds precisely to the 
location of the cognate gene on Chr 14 at 52.5 Mb. 

Results 

Variation in gene expression 

The Affymetrix M430 2.0 array that we used includes 
45,101 probe sets and provides consensus estimates of 
expression for the vast majority of all protein coding 
genes. Table 1 gives an overview of the range of varia- 
tion across strains in each of the probe sets used. Strik- 
ingly, more than 2,000 genes showed a range of 
expression that was larger than four-fold different 
between the strain with the lowest and the highest 
expression. Among the genes with the most extreme 
range in expression levels were Krt4 (keratin 4), Krt13 
(keratin 13) and Krtdap (keratinocyte differentiation 
associated protein). Another gene with highly variable 
expression was Cftr (cystic fibrosis transmembrane 
conductance regulator homolog). This important lung 
disease-causing gene showed a four-fold variation in 
expression levels between strains. Several other genes 
with high variation were sex-specifically expressed 
genes, like Xist (inactive X specific transcripts), Ddx3y 
(DEAD (Asp-Glu-Ala-Asp) box polypeptide 3, Y-linked) 
and Serpina1b (serine (or cysteine) peptidase inhibitor, 
clade A, member 1B). 



Table 1 Variation in gene expression for 45,101 probe sets. 

Fold change range   Log2 range   No. of genes
1-2                 0-1          30,392
2-4                 1-2          11,965
4-8                 2-3           1,980
8-16                3-4             498
16-32               4-5             132
32-64               5-6              60
64-inf              6-inf            42



Fold changes between the mouse strains with the lowest and highest expression were 
calculated for each probe set and divided into seven bins. The corresponding 
range on the log2 scale and the number of genes in each range are given. 
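
The binning behind Table 1 can be sketched as below. Because the normalized values are on a log2 scale, the difference between the highest and lowest strain is the log2 fold change. The function names and the handling of values that fall exactly on a bin boundary are assumptions made for this illustration.

```python
import math
from collections import Counter

# Bin labels as in Table 1; index i covers a log2 range of [i, i + 1), the last bin is open-ended.
BIN_LABELS = ["1-2", "2-4", "4-8", "8-16", "16-32", "32-64", "64-inf"]


def fold_change_bin(log2_values):
    """Bin one probe set by the log2 range between its lowest and highest strain."""
    log2_range = max(log2_values) - min(log2_values)
    index = min(int(math.floor(log2_range)), len(BIN_LABELS) - 1)
    return BIN_LABELS[index]


def tabulate(expression_by_probe_set):
    """Count probe sets per fold-change bin."""
    return Counter(fold_change_bin(v) for v in expression_by_probe_set.values())


# Made-up strain values for three hypothetical probe sets.
toy = {
    "probe_a": [8.0, 8.2, 8.5],
    "probe_b": [7.0, 8.1, 9.4],
    "probe_c": [3.0, 9.0, 14.0],
}
print(tabulate(toy))
```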
Heritability of variation in gene expression 

To investigate to which extent the variation in expres- 
sion was due to genetic effects, we calculated the herit- 
ability for each of the genes, which is the fraction of 
variation in expression caused by genetics. The 
heritability values ranged from as high as 0.96 (most of the 
variance was associated with between-strain differences) 
to as low as 0.01. Genes with the largest heritability 
were Cdk17/Pctk2 (cyclin-dependent kinase 17, probe 
set 1446130_at), Gm1337 (predicted gene 1337, 
1443287_at) and Pdxdc1/KIAA0251 (pyridoxal-dependent 
decarboxylase domain containing 1, 1452705_at), 
all having a value above 0.99. High heritability values 
indicate that it is likely to successfully map QTLs that 
influence gene expression values. 
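
The heritability estimate defined in the Methods (mean between-strain variance divided by the sum of the mean between-strain and mean within-strain variance, from a one-way ANOVA) can be written out as a short sketch. The function name and the toy values are hypothetical, and the code assumes at least two strains with replicate measurements.

```python
import statistics


def heritability(values_by_strain):
    """MS_between / (MS_between + MS_within) from a one-way ANOVA with strain as the factor."""
    groups = [v for v in values_by_strain.values() if v]
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)   # requires at least two strains
    ms_within = ss_within / (n - k)     # requires replicates within strains
    return ms_between / (ms_between + ms_within)


# Made-up expression values: strains differ strongly, replicates agree closely.
toy = {"strain_A": [8.0, 8.2], "strain_B": [10.0, 10.2], "strain_C": [9.0, 9.2]}
print(round(heritability(toy), 3))
```

A value close to 1 means that almost all variance lies between strains, and a value close to 0 means that replicates within a strain differ as much as strains do.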



Lung-specific genes 

The large dataset in GeneNetwork and its built-in 
features allowed us to compare the gene expression 
patterns in the lung with data from 25 other tissues. First, 
we identified the most highly expressed genes in lung 
(Table 2 lists the 15 highest expression signals). Two of 
these genes were highly restricted to the lung and 
trachea: Sftpc (surfactant associated protein C) and Ager 
(advanced glycosylation end product-specific receptor) 
(Figure 1A, B), whereas Scgb1a1 (secretoglobin, family 
1A, member 1 (uteroglobin)) was highly expressed in 
lung but also showed expression in some other tissues 
(Figure 1C). On the other hand, Hba-a1 (hemoglobin 
alpha, adult chain 1) was expressed at high levels in 
most tissues (Figure 1D). We then used Sftpc in a tissue 
Table 2 List of 15 probe sets with highest expression signals in the lung 

Probe set                    Symbol    Description                                                 Location (Chr, Mb)  Mean Expr  Tissue-specific expression
1428361_x_at                 Hba-a1    hemoglobin alpha, adult chain 1                             Chr11: 32.184441    15.10      MT
1418639_at                   Sftpc     surfactant associated protein C                             Chr14: 70.920826    14.92      LS
1452543_a_at                 Scgb1a1   secretoglobin, family 1A, member 1 (uteroglobin)            Chr19: 9.158206     14.78      LHOT
1417184_s_at                 Hbb-b2    hemoglobin, beta adult minor chain                          Chr7: 110.976103    14.74      MT
1441958_s_at                 Ager      advanced glycosylation end product-specific receptor        Chr17: 34.737745    14.69      LS
AFFX-b-ActinMur/M12481_3_at  Actb      actin beta, cytoplasmic                                     Chr5: 143.665528    14.67      MT
1452757_s_at                 Hba-a1    hemoglobin alpha, adult chain 1                             Chr11: 32.196742    14.66      MT
1416642_a_at                 Tpt1      tumor protein, translationally-controlled 1                 Chr14: 76.246098    14.62      MT
1418509_at                   Cbr2      carbonyl reductase 2                                        Chr11: 120.628111   14.62      LHOT
1436996_x_at                 Lzp-s     P lysozyme structural and lysozyme                          Chr10: 116.724902   14.62      ND
[illegible]                  Uba52     ubiquitin A-52 residue ribosomal protein fusion product 1   Chr8: 73.032191     14.58      MT
1427021_s_at                 Fth1      ferritin heavy chain 1                                      Chr19: 10.057382    14.57      ND
AFFX-MURINE_B2_at            B2        short interspersed nuclear element (SINE) class of repeat   N/A                 14.52      ND
                                       (probes target Chr 1 and Chr 2 most heavily)
1415906_at                   Tmsb4x    thymosin, beta 4, X chromosome                              ChrX: 163.645132    14.51      MT
1449436_s_at                 Ubb       ubiquitin B                                                 Chr11: 62.366564    14.50      MT

Mean Expr: mean expression in lung for BXD strains. LS: lung specific expression, LHOT: highly expressed in lung but also in other tissues, MT: expressed in many 
tissues or mainly in non-lung tissues, ND: no data for other tissues than lung available. 



correlation analysis to identify other genes that may not 
be as highly expressed but still be restricted to lung tis- 
sue. The first 70 probe sets found were then analyzed as 
above for lung-specific expression, and 15 genes were 
identified (Table 3). A comparison with the expression 
patterns described in the BioGPS database [19] 
confirmed that the majority were expressed only in lung, 
most of them at a high level. Two genes were not 
restricted to the lung according to BioGPS, and five 
genes were also found at lower levels in one other tissue 
(Table 3). 

Identification of gene networks using correlations 

The large data set of expression values for ~39,000 
transcripts in 57 mouse strains allowed us to calculate 
correlations between any pair of genes. A Spearman 
rank correlation analysis identified 12,985 pairs of genes 
with a correlation value above 0.8, and 604 pairs showed 
a correlation value of 0.9 or higher. For example, the 
expression of Klra3 (killer cell lectin-like receptor 
subfamily A, member 3) was strongly correlated with 
the expression of Gzma (granzyme A) (Figure 2A). 
Klra3 also appeared to be strongly correlated with 
Il18rap (interleukin 18 receptor accessory protein, 
Figure 2B). We then calculated the first principal 
component of the Klra3, Gzma, Il18rap and Klrg1 (killer cell 
lectin-like receptor subfamily G, member 1) genes and 
used it to determine the correlations with all other 
genes in the lung data set. In this way, we could identify 
a network of nine genes exhibiting a correlation of >= 
0.8 with this principal component (Figure 2C). One of 
the newly identified genes was Prf1 (perforin 1), which 
was correlated with a p-value of < 10^-16 with the 
principal component (Figure 2D). If genes exhibit a strong 
correlation of their expression values, one may 
hypothesize that they are involved in the same biological process 
or pathway, or that they are expressed in the same cell 
type. 
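
The principal-component step described above can be illustrated with a small standard-library sketch. It is not the GeneNetwork implementation: the function names are hypothetical, the first component is obtained by power iteration on the correlation matrix of the seed genes, and the use of Pearson correlation with an absolute-value cutoff is an assumption (the sign of a principal component is arbitrary).

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation of two equally long profiles (neither may be constant)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardize(x):
    m, s = statistics.fmean(x), statistics.stdev(x)
    return [(v - m) / s for v in x]


def first_principal_component(profiles, n_iter=200):
    """PC1 score per strain for a list of seed-gene profiles (power iteration)."""
    z = [standardize(p) for p in profiles]
    g = len(z)
    corr = [[pearson(z[i], z[j]) for j in range(g)] for i in range(g)]
    w = [1.0] * g
    for _ in range(n_iter):
        w = [sum(corr[i][j] * w[j] for j in range(g)) for i in range(g)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    n_strains = len(z[0])
    return [sum(w[i] * z[i][s] for i in range(g)) for s in range(n_strains)]


def network_members(seed_profiles, all_profiles, threshold=0.8):
    """Genes whose absolute correlation with PC1 of the seed genes reaches the threshold."""
    pc1 = first_principal_component(seed_profiles)
    members = {}
    for name, profile in all_profiles.items():
        r = pearson(pc1, profile)
        if abs(r) >= threshold:
            members[name] = r
    return members


# Made-up profiles across six strains: three seed genes and two candidate genes.
seeds = [[1, 2, 3, 4, 5, 6], [1.1, 2.0, 2.9, 4.2, 5.1, 5.8], [0.9, 2.2, 3.1, 3.9, 4.8, 6.1]]
candidates = {"linked": [2, 4, 6, 8.1, 10, 12], "unlinked": [3, 1, 3, 1, 3, 1.5]}
print(sorted(network_members(seeds, candidates)))
```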

In a similar way, we identified another gene network 
of 20 genes that exhibited very high correlations of their 
Table 3 List of genes with lung-restricted expression found by tissue correlation analysis with Sftpc 

Probe set     Symbol          Description                                               Location (Chr, Mb)  Mean Expr  BioGPS expression
1418639_at    Sftpc           surfactant associated protein C                           Chr14: 70.920826    14.92      high in lung, low in nucleus accumbens
1437028_at    Sftpb           surfactant associated protein B (nonciliated bronchiolar  Chr6: 72.260763     13.68      high in lung only
                              and alveolar type 2 cell signature)
1422334_a_at  Sftpa1          surfactant associated protein A1                          Chr14: 41.946994    14.24      high in lung only
1422346_at    Nkx2-1 (Titf1)  thyroid transcription factor 1                            Chr12: 57.634187    8.07       lung only
1417057_a_at  Lamp3           lysosomal-associated membrane protein 3                   Chr16: 19.653875    12.09      high in lung, low in ES cells and some cell lines
1421404_at    Cxcl15          chemokine (C-X-C motif) ligand 15                         Chr5: 91.230349     13.87      high in lung only
1441958_s_at  Ager            advanced glycosylation end product-specific receptor      Chr17: 34.737745    14.69      high in lung only
1436787_x_at  Sec14l3         SEC14-like protein 3                                      Chr11: 3.978573     13.21      only data for human available - not lung specific
1425218_a_at  Scgb3a2         secretoglobin, family 3A, member 2                        Chr18: 43.924081    14.17      high in lung only
1449428_at    Cldn18          claudin 18                                                Chr9: 99.591247     12.70      highest in lung, lower in stomach
1449525_at    Fmo3            flavin containing monooxygenase 3                         Chr1: 164.884088    10.90      high in lung, maybe weak in some other tissues
1425814_a_at  Calcrl          calcitonin receptor-like                                  Chr2: 84.170818     12.91      high in lung, weak in macrophages
1421373_at    Cox4i2          cytochrome c oxidase subunit IV isoform 2                 Chr2: 152.582819    9.24       not specific for lung
1419699_at    Scgb3a1         secretoglobin, family 3A, member 1                        Chr11: 49.477871    13.68      high in lung only
1451604_a_at  Acvrl1          activin A receptor, type II-like 1                        Chr15: 100.968668   11.86      high in lung only
1420347_at    Plunc           palate, lung, and nasal epithelium carcinoma associated   Chr2: 153.973359    13.42      high in lung, low in heart



Mean Expr: mean expression in lung for BXD strains. BioGPS: evaluation of expression pattern as described in BioGPS. 



expression levels across all mouse strains. All possible 
pairs of genes in this network showed a correlation 
above 0.95 (Figure 3). The network contained two 
keratin genes, Krt4 (keratin 4) and Krt13 (keratin 13), and 
genes involved in cytoskeleton functions, again pointing 
to a possible interaction of these genes in the same 
pathway or biological process. Further gene networks 
found by correlation studies were related to B and T 
cells (see below). 

Correlation analysis identified gene expression signatures 
for T and B cells 

The hemoglobin genes Hba-a1 (hemoglobin alpha, adult 
chain 1) and Hbb-b2 (hemoglobin beta, adult minor 
chain) were among the top 10 genes with highest 
expression values in our lung data set. The high levels 
of hemoglobin transcripts suggested that circulating 
blood cells, including immune cells, may also be 
analyzed in our data set. Therefore, we investigated the 
gene expression networks of known immune cell 
markers, e.g. Cd3 genes as specific markers for T cells. 
We calculated the correlations of Cd3d (Cd3 antigen, 
delta polypeptide) expression levels over all BXD lines 
with all other genes. This analysis revealed 20 genes 
with a very highly correlated expression value (p-value 
below 10^-14, Figure 4). Most of these genes were known 
T cell markers or involved in T cell regulation. Eight 
out of the 12 genes with the strongest correlations were 
also exclusively expressed in T cells according to the 
BioGPS database (Wu et al., 2009): Cd3d, Itk, Tcrb-V13, 
Cd3e, Cd3g, Scap1, Cd6 and Cd5 (see Figure 4 for full 
gene names). Similarly, we searched for B cell-specific 
signatures starting with the B cell marker gene Cd19 
(CD19 antigen). The probe set "1450570_a_at" detected 
Cd19 mRNA levels and showed a mean expression level 
of 9.3. We found 14 probe sets with a correlation above 
0.80 (p-value < 10^-14, Figure 5). A comparison with the 
BioGPS database revealed that eight of them, Cd19, 
Cd79b, Faim3, Cd79a, Blk, B3gnt5, Cd22 and Blr1 (see 
Figure 5 for full gene names) were also exclusively 
[Figure 2: panels A, B and D are scatter plots (one surviving plot annotation reads r = -0.856, P < 1.00E-16); panel C is the table below.] 

Record ID     Symbol   Location           Mean Expr  Sample p(r)  Description
1417898_a_at  Gzma     Chr13: 113.884077  11.315     0.00e+00     granzyme A
1420788_at    Klrg1    Chr6: 122.220647   8.325      0.00e+00     killer cell lectin-like receptor subfamily G, member 1
1425436_x_at  Klra3    Chr6: 130.273398   9.492      0.00e+00     killer cell lectin-like receptor, subfamily A, member 3; mid-distal 3' UTR
1456545_at    Il18rap  Chr1: 40.607957    8.743      0.00e+00     interleukin 18 receptor accessory protein
1451862_a_at  Prf1     Chr10: 60.766209   8.125      0.00e+00     perforin 1 (pore forming protein)
1449361_at    Tbx21    Chr11: 96.959401   7.635      0.00e+00     T-box 21
1418126_at    Ccl5     Chr11: 83.339355   10.747     0.00e+00     chemokine (C-C motif) ligand 5; last two exons and proximal 3' UTR
1443937_at    Il2rb    Chr15: 78.309800   9.010      8.88e-16     interleukin 2 receptor, beta chain
1449235_at    Fasl     Chr1: 163.711114   6.728      2.62e-14     Fas ligand (TNF superfamily, member 6)

Figure 2 Expression signals for strongly correlated genes in the lung. Numbers indicate BXD strains, and the parental C57BL/6J and DBA/2J 
strains as well as F1 individuals are presented. Expression signals of (A) Klra3 and Gzma (p < 10^-16) and (B) Klra3 versus Il18rap (p < 10^-11) were 
strongly correlated. (C) List of nine genes highly correlated with the first principal component of the expression of Gzma, Klrg1, Klra3 and Il18rap. 
(D) Strong correlation between the first principal component and Prf1 (p < 10^-16). X- and Y-axis of the plots show the names of genes used for 
the analysis. 



expressed in B cells. Therefore, these genes can be con- 
sidered as T and B cell signature genes which may be 
used to follow the presence and infiltration of T and B 
cells in the lung under normal and pathological 
conditions. 

Identification of candidate genes regulating phenotypic 
traits in the lung 

Once a QTL for a phenotypic trait has been found, it 
will be important to identify the underlying quantitative 
trait gene (QTG) which is causing the variation. Searching 
for cis-eQTLs in the QTL interval represents one suitable 
approach [8]. As a prototype for this approach in our 
lung data set, we examined two traits for which lung 
phenotypes were studied in the BXD population and 
which were available in GeneNetwork. Boon et al. [20] 
described several QTLs for the susceptibility of BXD 
mice to influenza A infections. We analyzed one 
significant QTL peak on chromosome 2 and two suggestive 
peaks on chromosomes 7 and 17. Seven cis-eQTL 
regulated genes were found in the chromosome 2 QTL 
interval (Table 4), including the Hc (hemolytic 
complement) gene which was shown to contribute to influenza 
susceptibility [20]. The analysis of the QTL region on 
chromosome 7 revealed 12 cis-regulated genes in the 
lung, including Trim12 (tripartite motif protein 12) and 
Trim34 (tripartite motif protein 34) which were also 
described as potential candidate QTGs by [20]. In the 
chromosome 17 QTL region, we found 17 cis-eQTL 
genes, of which Prkcn (protein kinase C, nu), Qpct 
[Figure 3: pairwise correlation matrix of the 20 selected genes; legible trait labels include Krt4 and Krtdap.] 

Figure 3 The cytokeratin network. Pearson correlations (listed below the diagonal) showed very high correlations between all pairs of the 20 
selected genes. Spearman rank correlations are shown above the diagonal. 


(glutaminyl-peptide cyclotransferase (glutaminyl cyclase)) 
and Mta3 (metastasis associated 3) were suggested as 
potential QTGs by [20]. Another lung-specific 
phenotype in the GeneNetwork database is "Mycoplasmosis 
susceptibility, alveolar exudate" (GeneNetwork ID 
10692, [21] and Cartner et al. unpublished). This trait 
showed a significant QTL on chromosome 10, between 
105 and 130 Mb. The analysis of our lung expression 
data revealed 16 cis-eQTLs in the genomic interval 
(Table 5). Three of the cis-eQTL genes have been 



Record | Probe set ID | Symbol | Description | Location (Chr: Mb) | Mean Expr | Max LRS | Max LRS Location (Chr: Mb) | Sample r | N Cases | Sample p(r) | Lit Corr | Tissue r | Tissue p(r)
1 | 1422828_at | Cd3d | CD3 antigen, delta polypeptide | Chr9: 44.789979 | 9.770 | 13.2 | Chr7: 81.491656 | 1.000 | 51 | 0.00e+00 | 1.000 | 1.000 | N/A
2 | 1452405_x_at | Tcra | T-cell receptor alpha chain | Chr14: 54.842731 | 9.754 | 12.7 | Chr11: 103.578807 | 0.917 | 51 | 0.00e+00 | 0.711 | 0.798 | 0.000
3 | 1425396_a_at | Lck | lymphocyte protein tyrosine kinase | Chr4: 129.225782 | 9.936 | 11.9 | Chr7: 81.491656 | 0.907 | 51 | 0.00e+00 | 0.610 | 0.909 | 0.000
4 | 1417171_at | Itk | IL2-inducible T-cell kinase; mid distal 3' UTR | Chr11: 46.138692 | 8.208 | 11.3 | Chr7: 81.491656 | 0.894 | 51 | 0.00e+00 | 0.619 | 0.883 | 0.000
5 | 1425854_x_at | Tcrb-V13 | T-cell receptor beta, variable 13 | Chr6: 41.496833 | 9.767 | 12.0 | Chr7: 85.847907 | 0.879 | 51 | 0.00e+00 | 0.726 | 0.882 | 0.000
6 | 1422105_at | Cd3e | CD3 antigen, epsilon polypeptide; distal 3' UTR | Chr9: 44.806844 | 9.656 | 8.6 | Chr7: 81.491656 | 0.872 | 51 | 0.00e+00 | 0.924 | 0.862 | 0.000
7 | 1426159_x_at | Tcrb-V13 | T-cell receptor beta, variable 13 | Chr6: 41.488285 | 10.520 | 10.8 | Chr7: 85.847907 | 0.865 | 51 | 0.00e+00 | 0.726 | 0.882 | 0.000
8 | 1426113_x_at | Tcra | T-cell receptor alpha chain | Chr14: 54.843469 | 10.140 | 10.6 | Chr11: 101.112194 | 0.864 | 51 | 0.00e+00 | N/A | 0.798 | 0.000
9 | 1419178_at | Cd3g | CD3 antigen, gamma polypeptide | Chr9: 44.777916 | 8.820 | 11.5 | Chr9: 80.917762 | 0.863 | 51 | 0.00e+00 | 0.976 | 0.844 | 0.000
10 | 1452205_x_at | Tcrb-V13 | T-cell receptor beta, variable 13 | Chr6: 41.488510 | 9.520 | 12.5 | Chr7: 85.847907 | 0.861 | 51 | 0.00e+00 | 0.726 | 0.882 | 0.000
11 | 1425226_x_at | Tcrb-V13 | T-cell receptor beta, variable 13 | Chr6: 41.488321 | 10.198 | 11.1 | Chr7: 85.847907 | 0.860 | 51 | 0.00e+00 | 0.726 | 0.882 | 0.000
12 | 1416107_at | Hmp19 | HMP19 protein, neuron specific gene family member 2 (hypothalamus golgi apparatus expressed 19 kDa protein, dopamine receptor binding); distal 3' UTR | Chr11: 31.958642 | 8.400 | 13.0 | Chr7: 81.491656 | 0.850 | 51 | 0.00e+00 | 0.306 | N/A | N/A
13 | 1460651_at | Lat | linker for activation of T cells | Chr7: 133.507917 | 9.211 | 14.0 | Chr7: 90.186486 | 0.839 | 51 | 0.00e+00 | 0.667 | 0.878 | 0.000
14 | 1426772_x_at | Tcrb-V13 | T-cell receptor beta, variable 13 | Chr6: 41.496891 | 10.006 | 10.1 | Chr7: 85.847907 | 0.837 | 51 | 0.00e+00 | 0.726 | 0.882 | 0.000
15 | 1434295_at | Rasgrp1 | RAS guanyl releasing protein 1; distal 3' UTR | Chr2: 117.105846 | 8.530 | 11.5 | Chr12: 101.866283 | 0.833 | 51 | 2.22e-16 | 0.609 | 0.452 | 0.020
16 | 1437249_at | Scap1 | src family associated phosphoprotein 1; last three exons | Chr11: 96.592455 | 8.072 | 14.2 | Chr7: 89.127385 | 0.822 | 51 | 6.66e-16 | 0.559 | N/A | N/A
17 | 1435227_at | Bcl11b | B-cell leukemia/lymphoma 11B; distal 3' UTR | Chr12: 109.150912 | 8.565 | 10.1 | Chr7: 81.491656 | 0.818 | 51 | 1.55e-15 | 0.580 | 0.413 | 0.036
18 | 1451910_a_at | Cd6 | CD6 antigen | Chr19: 10.864137 | 7.856 | 9.3 | Chr7: 125.263073 | 0.815 | 51 | 2.66e-15 | 0.733 | 0.864 | 0.000
19 | 1438392_at | 4833413G11Rik | 4833413G11Rik (Cd247 antigen-associated); 3' UTR (in Cd3z intron 1) | Chr1: 167.735938 | 7.114 | 10.5 | Chr7: 81.491656 | 0.815 | 51 | 2.66e-15 | N/A | N/A | N/A
20 | 1418353_at | Cd5 | CD5 antigen | Chr19: 10.792689 | 7.984 | 11.3 | Chr12: 118.628399 | 0.805 | 51 | 1.27e-14 | 0.789 | 0.708 | 0.000

Figure 4 Gene signatures for T-cells. List of the strongest correlates for Cd3d (probe set 1422828_at), all correlated at p < 10^-13. 
Record | Probe set ID | Symbol | Description | Location (Chr: Mb) | Mean Expr | Max LRS | Max LRS Location (Chr: Mb) | Sample r | N Cases | Sample p(r) | Lit Corr | Tissue r | Tissue p(r)
1 | 1450570_a_at | Cd19 | CD19 antigen | Chr7: 133.551983 | 9.343 | 10.1 | Chr16: 73.748201 | 1.000 | 51 | 0.00e+00 | 1.000 | [illegible] | N/A
2 | 1417640_at | Cd79b | CD79B antigen | Chr11: 106.172725 | 10.376 | 11.8 | Chr5: 133.538653 | 0.917 | 51 | 0.00e+00 | 0.811 | 0.914 | 0.000
3 | 1429889_at | Faim3 | Fas apoptotic inhibitory molecule 3 | Chr1: 132.774829 | 9.773 | 12.6 | Chr6: 88.537080 | 0.909 | 51 | 0.00e+00 | 0.462 | 0.887 | 0.000
4 | 1418830_at | Cd79a | CD79A antigen (immunoglobulin-associated alpha) | Chr12: 53.180027 | 9.950 | 15.4 | Chr10: 13.407601 | 0.891 | 51 | 0.00e+00 | 0.858 | 0.961 | 0.000
5 | 1422775_at | Blk | B lymphoid kinase (oncogene); last exon and proximal half of 3' UTR | Chr14: 63.991841 | 8.724 | 13.1 | Chr6: 88.537080 | 0.865 | 51 | 0.00e+00 | 0.713 | 0.928 | 0.000
6 | 1420994_at | B3gnt5 | UDP-GlcNAc:betaGal beta-1,3-N-acetylglucosaminyltransferase 5; distal and far 3' UTR | Chr16: 19.771998 | 8.785 | 12.8 | Chr10: 12.729006 | 0.847 | 51 | 0.00e+00 | 0.353 | 0.514 | 0.007
7 | 1419768_at | Cd22 | CD22 antigen | Chr7: 31.650937 | [illegible] | 15.8 | Chr7: 31.950330 | 0.846 | 51 | 0.00e+00 | 0.869 | [illegible] | 0.000
8 | 1419907_s_at | Fcrla | Fc receptor-like A; expressed sequence BB219290 | Chr1: 172.847803 | 7.411 | 16.0 | Chr15: 91.723348 | 0.837 | 51 | 0.00e+00 | N/A | 0.963 | 0.000
9 | 1422003_at | Blr1 | Burkitt lymphoma receptor 1 | Chr9: 44.320024 | 8.601 | 13.1 | Chr6: 88.537080 | 0.834 | 51 | 0.00e+00 | 0.663 | 0.895 | 0.000
10 | 1419206_at | Cd37 | CD37 antigen | Chr7: 52.489289 | 10.082 | 11.3 | Chr10: 13.407601 | 0.829 | 51 | 2.22e-16 | 0.634 | 0.871 | 0.000
11 | 1460407_at | Spib | Spi-B transcription factor (Spi-1/PU.1 related) | Chr7: 51.781391 | 9.152 | 10.4 | Chr1: 108.290874 | 0.829 | 51 | 2.22e-16 | 0.716 | 0.967 | 0.000
12 | 1423182_at | Tnfrsf13b | tumor necrosis factor receptor superfamily, member 13b | Chr11: 60.962284 | 9.318 | 20.4 | Chr6: 88.537080 | 0.812 | 51 | 4.44e-15 | 0.747 | 0.792 | 0.000
13 | 1460419_a_at | Prkcb | protein kinase C, beta; distal 3' UTR | Chr7: 129.777363 | 9.675 | 13.2 | Chr5: 147.658807 | 0.805 | 51 | 1.38e-14 | 0.432 | N/A | N/A
14 | 1456632_at | Bcl11a | B-cell CLL/lymphoma 11A (zinc finger protein); distal 3' UTR or last intron | Chr11: 24.066781 | 7.033 | 12.3 | Chr10: 12.729006 | 0.802 | 51 | 2.00e-14 | 0.563 | 0.648 | 0.000
15 | 1425736_at | Cd37 | CD37 antigen | Chr7: 52.493149 | 9.102 | 15.2 | Chr10: 12.729006 | 0.796 | 51 | 4.71e-14 | 0.634 | 0.871 | 0.000
16 | 1419307_at | Tnfrsf13c | tumor necrosis factor receptor superfamily, member 13c | Chr15: 82.052242 | 8.329 | 10.1 | Chr6: 91.144186 | 0.796 | 51 | 5.20e-14 | 0.804 | 0.970 | 0.000
17 | 1456328_at | A530094C12Rik | RIKEN cDNA A530094C12 gene | Chr3: 135.716431 | 9.356 | 10.7 | Chr15: 70.982535 | 0.789 | 51 | 1.27e-13 | 0.651 | N/A | N/A
18 | 1419406_a_at | Bcl11a | B-cell CLL/lymphoma 11A (zinc finger protein) | Chr11: 24.072849 | 7.832 | 12.8 | Chr15: 95.144975 | 0.788 | 51 | 1.56e-13 | 0.563 | 0.648 | 0.000
19 | 1438995_at | Panx3 | pannexin 3 | Chr13: 24.747062 | 8.779 | 10.2 | Chr15: 95.144975 | 0.783 | 51 | 2.85e-13 | 0.222 | 0.056 | 0.786
20 | 1423478_at | Prkcb1 | protein kinase C, beta 1; mid 3' UTR | Chr7: 129.771598 | 7.297 | 13.2 | Chr5: 147.658807 | 0.781 | 51 | 3.74e-13 | 0.432 | 0.431 | 0.028

Figure 5 Gene signatures for B-cells. List of the strongest correlates for Cd19 (probe set 1450570_a_at), all correlated at p < 10^-12. 



associated previously with immune functions and thus represent 
suitable candidates to regulate this trait: Chst 
(carbohydrate (keratan sulfate Gal-6) sulfotransferase 1) 
was found to play a critical role in lymphocyte trafficking 
during chronic inflammation [22]. The transcription factor Maf 
(avian musculoaponeurotic fibrosarcoma (v-maf) AS42 oncogene 
homolog) was shown to play a role in the transcriptional 
regulation of cytokine expression and immune cell markers, 
e.g. [23-29]. Nrp1 (neuropilin 1) has been primarily described 
as a neuronal receptor but also appears to play a role in the 
primary immune response and formation of the immunological 
synapse [30,31]. 

Cis- and trans-eQTLs 

We then performed a search for eQTLs on a global level, for all 
probe sets. In this analysis, 5,214 cis- and 15,485 
trans-regulated genes were identified at an LRS threshold of 12 
(Table 6 and Figure 6). When the LRS threshold was increased to 
50, 1,332 cis-regulated genes were found, whereas the number of 
trans-regulated genes was reduced to 15. This observation 
indicates that many of the trans-eQTLs showed a much lower 
significance value than the cis-eQTLs. Next, we present examples 
for one cis- and one trans-eQTL. A strong eQTL was detected on 
chromosome 14, at 52 megabases (Mb; Figure 7B), regulating the 
expression levels of Ang (angiogenin, ribonuclease, RNase A 
family, 5) (Figure 7A). Since Ang is located at the same 
position as the eQTL (51.7 Mb on chromosome 14), it represents a 
cis-eQTL. Furthermore, a strong eQTL was found on chromosome 12 
regulating the expression levels of the Cyp1a1 gene (cytochrome 
P450, family 1, subfamily a, polypeptide 1) (Figure 7C,D). 
Cyp1a1 is located on chromosome 9 and the corresponding eQTL was 
found on chromosome 12 (trans-eQTL). The eQTL significance 
interval contained nine genes, four of which were expressed in 
lung at a level above 10. Ahr (aryl-hydrocarbon receptor) was 
one of the four genes and was at the top of the QTL peak 
(Figure 8). It is the most likely candidate for Cyp1a1 
regulation. In conclusion, our data set contained a large number 
of genes whose expression levels are likely to be influenced by 
allelic variations in the genomes of C57BL/6J and DBA/2J. 
Therefore, the 



Table 4 Cis-eQTLs identified in QTL interval on chromosome 2 for influenza susceptibility 

Probe set | Symbol | Description | Location (Chr: Mb) | Mean Expr | Max LRS
1423602_at | Traf1 | Tnf receptor-associated factor 1 | Chr2: 34.798805 | 9.28 | 21.1
1419407_at | Hc | hemolytic complement | Chr2: 34.838908 | 12.00 | 82.7
1441635_at | Nr6a1 | nuclear receptor subfamily 6, group A, member 1 | Chr2: 38.736451 | 7.51 | 20.2
1455743_at | Olfml2a | olfactomedin-like 2A | Chr2: 38.816929 | 8.28 | 63.2
1430379_at | Zfhx1b | zinc finger homeobox 1b | Chr2: 44.931019 | 9.08 | 82.4
1438516_at | Rif1 | Rap1 interacting factor 1 | Chr2: 51.975068 | 8.03 | 38.8
1444530_at | Neb | nebulin | Chr2: 51.991339 | 8.14 | 86.9

For each gene, only the highest LRS is shown. Mean Expr: mean expression in lung for BXD strains. 
Table 5 Cis-eQTLs identified in QTL on chromosome 8 for Mycoplasmosis susceptibility trait 

Probe set | Symbol | Description | Location (Chr: Mb) | Mean Expr | Max LRS
1435883_at | AW413431 | expressed sequence AW413431 | Chr8: [illegible] | 8.27 | 37
1436986_at | Sntb2 | syntrophin, basic 2 | Chr8: 109.537595 | 6.94 | 23.6
1437003_at | [illegible] | RIKEN cDNA [illegible] gene | Chr8: 109.543026 | 9.79 | 24.6
1451052_at | Cog8 | component of oligomeric golgi complex 8 | Chr8: 109.570082 | 10.35 | 23.7
1417766_at | 1810044O22Rik | RIKEN cDNA 1810044O22 gene | Chr8: 109.710789 | 11.80 | 38
1429725_at | Atbf1 | AT motif binding factor 1 | Chr8: 111.481987 | 8.84 | 70.1
1453393_a_at | Chst4 | carbohydrate (chondroitin 6/keratan) sulfotransferase 4 | Chr8: 112.553165 | 7.33 | 71.9
1427513_at | Nudt7 | nudix (nucleoside diphosphate linked moiety X)-type motif 7 | Chr8: 116.678269 | 6.95 | 22.7
1446412_at | Wwox | WW domain-containing oxidoreductase | Chr8: 117.339587 | 7.46 | 94.7
1444073_at | Maf | avian musculoaponeurotic fibrosarcoma (v-maf) AS42 oncogene homolog | Chr8: 118.225461 | 7.93 | 121
1449964_a_at | Mlycd | malonyl-CoA decarboxylase (test Mendelian in BXDs with high DBA/2J allele) | Chr8: 121.934407 | 9.63 | 34.8
1418856_a_at | Fanca | Fanconi anemia, complementation group A | Chr8: 125.792224 | 7.98 | 78.5
1460109_at | D8Ertd325e | DNA segment, Chr 8, ERATO Doi 325, expressed | Chr8: 125.915951 | 7.60 | 89.8
1449307_at | Dbndd1 | dysbindin (dystrobrevin binding protein 1) | Chr8: 126.029666 | 7.14 | 24.1
1446982_at | Pard3 | par-3 (partitioning defective 3) homolog (C. elegans) | Chr8: 130.036847 | 8.02 | 87.9
1448944_at | Nrp1 | neuropilin 1 | Chr8: 131.027919 | 11.95 | 42.4

For each gene, only the highest LRS is shown. Mean Expr: mean expression in lung for BXD strains. 



presence of pairs of regulated genes and their corresponding 
eQTLs predicts possible regulatory interactions and will allow 
searching for yet unknown regulatory networks. 
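
The cis/trans distinction and the threshold counts described above can be illustrated with a minimal Python sketch. It is not the procedure used in GeneNetwork: the 10 Mb window that decides whether a peak counts as cis (`CIS_WINDOW_MB`) and all function names are assumptions made for illustration, since the text only states that a cis-eQTL lies at the same position as the gene. The example rows are taken from the correlation tables in Figures 4 and 5. The LRS-to-LOD conversion follows from the definitions LRS = 2 ln(L1/L0) and LOD = log10(L1/L0).

```python
import math
from collections import Counter

# Assumption: a peak counts as "cis" when it is on the transcript's chromosome
# and within this distance of the gene. The paper does not state a window size.
CIS_WINDOW_MB = 10.0


def lrs_to_lod(lrs):
    """Convert a likelihood ratio statistic to a LOD score: LOD = LRS / (2 ln 10)."""
    return lrs / (2.0 * math.log(10.0))


def classify_eqtl(gene_chr, gene_mb, qtl_chr, qtl_mb, window_mb=CIS_WINDOW_MB):
    """Label an eQTL as 'cis' or 'trans' from gene and peak positions."""
    if gene_chr == qtl_chr and abs(gene_mb - qtl_mb) <= window_mb:
        return "cis"
    return "trans"


def count_by_threshold(records, thresholds):
    """Count cis and trans eQTLs whose maximum LRS reaches each threshold.

    records: iterable of (symbol, gene_chr, gene_mb, qtl_chr, qtl_mb, max_lrs).
    Returns {threshold: (n_cis, n_trans)}.
    """
    counts = {}
    for threshold in thresholds:
        tally = Counter(
            classify_eqtl(gene_chr, gene_mb, qtl_chr, qtl_mb)
            for _, gene_chr, gene_mb, qtl_chr, qtl_mb, lrs in records
            if lrs >= threshold
        )
        counts[threshold] = (tally["cis"], tally["trans"])
    return counts


# Rows taken from Figures 4 and 5 (gene location, peak location, maximum LRS).
EXAMPLE_RECORDS = [
    ("Cd22", "Chr7", 31.650937, "Chr7", 31.950330, 15.8),
    ("Cd3d", "Chr9", 44.789979, "Chr7", 81.491656, 13.2),
    ("Cd3g", "Chr9", 44.777916, "Chr9", 80.917762, 11.5),
    ("Fcrla", "Chr1", 172.847803, "Chr15", 91.723348, 16.0),
]

if __name__ == "__main__":
    for threshold, (n_cis, n_trans) in count_by_threshold(
        EXAMPLE_RECORDS, [12, 16, 20]
    ).items():
        print(f"LRS >= {threshold} (LOD >= {lrs_to_lod(threshold):.2f}): "
              f"{n_cis} cis, {n_trans} trans")
```

With a fixed window, a peak on the same chromosome but far from the gene (Cd3g in the example rows) is counted as trans; a different window would change that assignment.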

Discussion 

Here, we performed global gene expression profiling in eight 
inbred mouse strains and a cohort of BXD recombinant inbred 
strains from whole lung tissues. Our studies identified several 
lung-specific genes, large variations in gene expression levels, 
and a strong heritability in many gene expression traits. 
Correlation analysis of gene expression and genotypes identified 
potential gene interaction networks, pairs of trans- and 
cis-eQTLs, and genes with cis-eQTLs that may represent candidate 
genes involved in susceptibility to respiratory infections. In 
addition, one specific gene interaction pathway was identified in 
which Ahr regulates the Cyp1a1 gene. 

Using tissue correlations of gene expression patterns 
across the BXD strains, we identified 16 genes with a 

Table 6 Amount of cis- and trans-regulated transcripts for different significance thresholds 

Threshold (LRS)   No. of cis eQTLs   No. of trans eQTLs 
12                5,214              15,485 
16                4,391              3,149 
20                3,666              536 
30                2,500              48 
50                1,332              15 



highly restricted expression in the lung, of which 14 could be 
validated by comparison to the BioGPS database [19]. The second 
most strongly expressed gene in the lung tissues was Sftpc, which 
has been shown to play a role in lung development and the 
prevention of pneumonitis and emphysema [32,33]. Also, Sftpc 
deficiency increases the severity of respiratory syncytial 
virus-induced pulmonary inflammation [34]. Furthermore, Scgb1a1 
and Ager were amongst the five most strongly expressed genes. 
Scgb1a1 is expressed in lung Clara cells and its deficiency 
results in enhanced susceptibility to environmental agents [35]. 
Scgb3a1 (secretoglobin, family 3A, member 1) and Scgb3a2 
(secretoglobin, family 3A, member 2) were shown by others to be 
highly expressed in the lung and at lower levels in other organs 
[36]. Scgb3a2 is down-regulated in inflamed airways [37] and 
plays an important role in lung development [38]. Sftpb 
(surfactant associated protein B (non-ciliated bronchiolar and 
alveolar type 2 cell signature)) is a hydrophobic peptide which 
enhances the surface properties of pulmonary surfactant and is 
expressed in non-ciliated bronchiolar and alveolar type 2 cells 
[39]. Maintenance of Sftpb expression is critical for survival 
during acute lung injury [40] and reduction of alveolar 
expression causes surfactant dysfunction and respiratory failure 
[41]. Plunc (palate, lung, and nasal epithelium carcinoma 
associated) is expressed in the oral, lingual, pharyngeal and 
respiratory epithelia [42] and members of the Plunc gene family 
are thought to play a role in the innate immune response [43]. 
The presence of Plunc protein in the lung decreases the levels 
of Mycoplasma 

Figure 6 Genome-wide graph of cis- and trans-eQTLs. The positions of the eQTLs are plotted against the locations of the corresponding 
transcript along the genome. Cis-regulated genes are located at the diagonal, all other dots represent trans-regulated genes. The significance 
level of each QTL is indicated by the color. 



pneumoniae and its levels are reduced in allergic inflammatory 
conditions [44]. Thus, the lung data set allowed us to find 
important genes that are expressed primarily in the lung and are 
important for lung homeostasis and prevention of disease. 

It should be noted that our analysis of genes with "restricted 
expression to the lung" is not exclusive; it only refers to the 
tissues that are represented in GeneNetwork and BioGPS. Also, the 
analysis performed here should not be considered to be 
comprehensive. More sophisticated approaches may be employed to 
identify additional genes which also fulfill the criterion of 
"lung-restricted" expression. 

Furthermore, genes may not be apparent in the lung transcriptome 
because they are expressed only in a small fraction of cells 
within the lung. This issue of dilution of expression signals is 
an important one and we have studied it in several tissues with 
considerable care (eye, retina, and numerous brain regions) using 
the same genetic methods and the same array platform. We were 
consistently able to detect expression of genes that are only 
expressed in very small cell subpopulations (<0.1%) such as rare 
amacrine cell subclasses in the retina [8] or very rare 
oxytocin-expressing neurons (<2000) in whole brain samples. The 
reason for the increased sensitivity is that with such large 
sample sizes (~70 lung arrays) the signal-to-noise ratios are 
much better than in standard studies using Affymetrix arrays. 
These studies typically use far fewer arrays and do not use 
genetic methods to "validate" the source of signal. 

The strong signal for hemoglobin and lymphocyte-specific genes 
clearly showed that gene expression patterns of circulating blood 
cells are readily detectable in the lung transcriptome. This 
raises the question of whether an organ should be studied with or 
without the blood it contains. The correct answer to this 
question depends, of course, on the particular circumstances. 
However, we feel strongly that a global systems and genetic 
approach requires the analysis of the entire organ. The 
expression of genes is not cell-autonomous and depends on the 
cellular microenvironment, physical factors (gas pressure and 
gradients, etc.), pathogen exposure, and many types of 
interactions. These factors also influence the expression of 
genes in blood cells. Therefore, we think that it is 

Figure 7 Examples for variations of expression levels in different BXD and inbred mouse strains and genome-wide analysis of cis- 
and trans-eQTLs. (A) Log2 expression levels for Ang1 and (B) corresponding cis-eQTL signal. The numbers at the top are chromosomes. The 
blue line represents the significance level of the QTL expressed as LRS score (likelihood ratio statistic). A positive additive coefficient (green line) 
indicates that DBA/2J alleles increased trait values. A negative additive coefficient (red line) indicates that C57BL/6J alleles increased trait values. 
The two horizontal lines mark the genome-wide significance levels at p < 0.05 (red line) and p < 0.63 (gray line). Ang1 is located on Chr 14 
(triangle) and the QTL peak is at the same location as the gene. (C) Log2 expression levels for Cyp1a1 and (D) corresponding trans-eQTL peak. 
Cyp1a1 is on chromosome 9 (triangle) and the QTL was found on chromosome 12. 

Figure 8 The transcription factor Ahr is located within the trans-eQTL region for Cyp1a1. The strongest eQTL for Cyp1a1 maps to 
Chromosome 12 and the QTL peaks between 36 and 37 megabases. The gene Ahr, indicated by an arrow, is located exactly at the top of the 
QTL peak. 



Alberts et al. Respiratory Research 2011, 12:61 
http://respiratory-research.com/content/12/1/61 






imperative to look simultaneously at all cells in a functional 
unit: in this case the whole lung plus the blood it contains. 

In conclusion, the combined analysis of expression levels and 
correlations in a variety of tissues allowed us to determine 
genes with restricted or preferential expression in the lung. 
For several of these genes, an important function in the lung 
has been described and the same may be assumed for the others. 
This information will also contribute to a better understanding 
of the biological function of these genes. 

Many phenotypic traits have been studied for the BXD mouse 
populations and several QTLs were identified which influence 
diseases or vulnerability in the lung. The detection of cis-eQTLs 
in the very same tissue is one method to identify potential 
candidate genes under the QTL which may causally influence the 
trait. Here, we investigated two traits in more detail, 
susceptibility to influenza virus and susceptibility to 
mycoplasmosis. Several cis-eQTLs were found in the corresponding 
QTL regions and in each case, genes could be identified with a 
presumed role in the host immune defense (discussed already in 
the results section). Thus, the study of cis-eQTLs in our data 
set may provide valuable candidates for other quantitative trait 
genes that influence important lung phenotypes. Furthermore, we 
found 13 BXD lines with low expression signals for Krt4, Krt13 
and Krtdap. Krt4 and Krt13 have been shown to be responsible for 
white sponge nevus (WSN), also known as Cannon's disease, which 
is an autosomal dominant skin condition in humans [45-47]. We 
propose that the 13 mouse strains have genetic alterations which 
result in low transcript levels of these genes and they may 
represent a good model for Cannon's disease. It should be noted, 
however, that no cis-eQTLs were found for any of the Krt genes. 

We also identified a set of genes for which the expression levels 
correlated highly with members of the Klr gene family. Klra3 and 
Klrg1 are killer cell lectin-like receptors that are exclusively 
expressed on natural killer cells (NK cells). NK cells form a 
major component of the innate immune system and kill cells by 
releasing small cytoplasmic granules of proteins called perforins 
and granzymes [48]. Both Gzma and Prf1 were in the gene network 
that we identified. In addition, correlations can also be used to 
expand already known gene networks in specific cell populations. 
When starting with the Cd3 T cell marker and calculating 
correlations with all other transcripts measured, we identified a 
strongly correlated network of genes, in which most of the genes 
were known as T cell markers or to be involved in T cell 
activation or homeostasis. In a similar way, when starting with 
the Cd19 B cell marker, we could identify a strongly correlated 
network of B cell signature genes. The analysis of these T cell 
and B cell expression signatures in the BioGPS database with 
expression profiles in mouse tissues revealed that indeed >90% of 
the T and B cell markers were specifically expressed in either T 
or B cells. Furthermore, most of the T and B cell signature genes 
represented genes with known function in B and T cell 
differentiation, activation and homeostasis. For example, the T 
cell signature included genes encoding subunits of the T cell 
receptor: Cd3d (CD3 antigen, delta polypeptide), Cd3g (CD3 
antigen, gamma polypeptide), Tcra (T-cell receptor alpha chain) 
and Tcrb-V13 (T-cell receptor beta, variable 13) and Lat (linker 
for activation of T cells) which are involved in T cell 
activation. The B cell signature contained components of the B 
cell antigen receptor complex, Cd19 (CD19 antigen) and Cd79a 
(CD79A antigen (immunoglobulin-associated alpha)), as well as Blk 
(B lymphoid kinase) tyrosine kinase which is associated with the 
receptors. Also, the correlations for both signatures in the 
spleen expression data set in GeneNetwork could indeed confirm 
that the signatures were strongly correlated (data not shown). In 
summary, these studies demonstrate that correlation analyses are 
able to identify genes which very likely interact in a common 
network or biological process. The approach used here may thus 
have a great potential to identify new networks and biological 
processes in the lung. In addition, starting with a known 
bona fide cell-specific gene and then analyzing gene expression 
values across strains, it is possible to identify a set of highly 
correlated genes. These gene sets can now be used as 
cell-specific signature genes in complex transcriptome studies, 
e.g. to detect infiltrating immune cells in the lungs after 
infection. 
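
The seed-based signature approach described above can be sketched in a few lines of Python: take one cell-specific probe set, correlate its expression across strains with every other probe set, and keep the strongest correlates. This is only an illustration under stated assumptions. The function names, the cutoff `min_r = 0.78` (chosen to sit just below the weakest sample r listed in Figure 5), and the toy expression values are not from the paper, and GeneNetwork's own implementation may differ.

```python
import math


def pearson_r(x, y):
    """Pearson correlation of two equally long lists of expression values."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def signature(expr, seed, min_r=0.78):
    """Return (probe, r) pairs correlated with the seed at r >= min_r, best first.

    expr maps probe set IDs to per-strain expression values, with all lists in
    the same strain order. min_r is an illustrative cutoff, not the paper's.
    """
    seed_values = expr[seed]
    hits = []
    for probe, values in expr.items():
        if probe == seed:
            continue
        r = pearson_r(seed_values, values)
        if r >= min_r:
            hits.append((probe, r))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up log2 expression values for five hypothetical strains.
TOY_EXPR = {
    "seed_T": [8.0, 9.0, 10.0, 11.0, 12.0],
    "probe_A": [7.1, 8.0, 9.2, 9.9, 11.0],    # tracks the seed
    "probe_B": [9.0, 9.1, 8.9, 9.0, 9.1],     # nearly flat
    "probe_C": [12.0, 11.0, 10.0, 9.0, 8.0],  # anti-correlated
}

if __name__ == "__main__":
    for probe, r in signature(TOY_EXPR, "seed_T"):
        print(f"{probe}\tr = {r:.3f}")
```

In a real analysis the seed would be a probe set such as 1422828_at (Cd3d), and the resulting list would then be checked against an independent resource, as was done here with BioGPS.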

The genetic mapping of lung expression profiles revealed many 
cis- and trans-eQTLs, indicating that many gene expression 
patterns in lung have a strong genetic component. Trans-eQTLs 
allow the identification of gene-gene regulatory networks. As an 
example, we found that the transcription factor Ahr was present 
in a trans-eQTL region detected for the Cyp1a1 gene. Ahr is a 
transcription factor known to induce Cyp1a1 transcription levels 
after ligand binding [49-51]. Six binding sites for the Ahr 
receptor ligand have been revealed in the 700-basepair DNA domain 
upstream of Cyp1a1 [52]. However, a critical leucine-to-proline 
substitution in Ahr results in a 15 to 20-fold reduction in the 
binding affinity of the proline variant found in DBA/2J compared 
to the leucine variant found in C57BL/6J [53]. Indeed, in our 
data set, expression values for Cyp1a1 were low for BXD strains 
carrying the DBA/2J allele at the Ahr locus and high for the 
strains carrying the C57BL/6J allele. Since Ahr is not 
cis-regulated in lung, the downstream effects appear to be only 
caused by changes in Ahr protein binding affinity. Although the 
interaction between Cyp1a1 and Ahr as such is not a new finding, 
it is quite remarkable that the interaction becomes apparent in 
lungs which were not exposed to an inducing xenobiotic. 
Furthermore, we do not see this relationship in several other 
tissues, such as liver. Therefore, our observation suggests that 
in the lung, which is potentially exposed to many xenobiotics, 
the Ahr receptor may always be activated at a low level. 
Alternatively, Ahr expression may be stimulated by yet unknown 
ligands that are also present under normal environmental 
conditions. 
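
The comparison of Cyp1a1 expression between strains carrying the C57BL/6J or the DBA/2J allele at the Ahr locus can be sketched as follows. This is a minimal illustration, not the authors' analysis: the function name, the strain labels, and the expression values are made up, and the additive effect is assumed to be half the difference between the two homozygous group means, signed so that a positive value means the DBA/2J allele increases the trait (the convention stated in the Figure 7 legend).

```python
from statistics import mean


def allele_effect(expression, genotype):
    """Compare a trait between strains grouped by genotype at one marker.

    expression: {strain: log2 expression value}
    genotype:   {strain: "B" (C57BL/6J allele) or "D" (DBA/2J allele)}
    Returns (mean_B, mean_D, additive), with additive = (mean_D - mean_B) / 2,
    an assumed convention in which positive values mean that the DBA/2J
    allele increases the trait.
    """
    b_values = [value for strain, value in expression.items()
                if genotype.get(strain) == "B"]
    d_values = [value for strain, value in expression.items()
                if genotype.get(strain) == "D"]
    if not b_values or not d_values:
        raise ValueError("both genotype classes need at least one strain")
    mean_b = mean(b_values)
    mean_d = mean(d_values)
    return mean_b, mean_d, (mean_d - mean_b) / 2.0


# Hypothetical strains and values, for illustration only.
TOY_GENOTYPE = {"strain_1": "B", "strain_2": "B", "strain_3": "B",
                "strain_4": "D", "strain_5": "D", "strain_6": "D"}
TOY_CYP1A1 = {"strain_1": 10.0, "strain_2": 10.5, "strain_3": 9.5,
              "strain_4": 8.0, "strain_5": 8.5, "strain_6": 7.5}

if __name__ == "__main__":
    mean_b, mean_d, additive = allele_effect(TOY_CYP1A1, TOY_GENOTYPE)
    print(f"mean B = {mean_b:.2f}, mean D = {mean_d:.2f}, "
          f"additive effect = {additive:.2f}")
```

A negative additive effect in this sketch corresponds to the pattern reported in the text, with lower Cyp1a1 expression in strains carrying the DBA/2J allele of Ahr.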

Conclusions 

Here, we showed that whole genome expression analysis of the 
lungs from a large set of strains of the BXD mouse population can 
be exploited to identify important gene regulatory networks. We 
found a large number of expression correlations and QTLs which 
can be further investigated to better understand molecular 
interaction networks in the lung. The search for cis-eQTLs in 
genomic intervals that were identified previously as QTLs for 
infectious diseases revealed several quantitative trait candidate 
genes. In addition, we demonstrated that the analysis of gene 
expression correlations, starting with a few cell-specific genes, 
could identify a larger set of genes which allows detecting the 
presence of B and T cells within the transcriptome of the whole 
lung. Such expression signatures will be very important to follow 
normal and abnormal host responses during infections and other 
diseases of the lung. 